Asian Language Parsing Evaluated by Hummingbird SearchServer™ at NTCIR-3
Abstract
Hummingbird submitted ranked result sets for the Chinese, Japanese and Korean Single Language Information Retrieval tracks of the Cross-Language Retrieval Task of the 3rd NII-NACSIS Test Collection for IR Systems Workshop (NTCIR-3). SearchServer 5.3’s segmenter for Asian text, compared to an overlapping n-gram approach, was found to modestly increase precision scores for Japanese, to have a neutral impact for Chinese, and to be detrimental for Korean. SearchServer’s option to case normalize Hiragana and Katakana n-grams increased precision substantially for one Japanese query and was of neutral impact for the others. Newline suppression was found to be of only minor benefit for n-gram parsing. Normalizing Han characters to Hangul had almost no effect on the Korean test collection.
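The overlapping n-gram baseline, newline suppression, and kana normalization mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SearchServer's implementation: the function names `overlapping_ngrams` and `fold_katakana`, the bigram default, and the reading of "case normalize" as folding Katakana onto Hiragana are all assumptions made for illustration.

```python
def fold_katakana(text):
    """Map Katakana letters (U+30A1..U+30F6) to their Hiragana counterparts.

    Assumption: this stands in for the "case normalization" of kana; the
    two blocks are laid out in parallel, 0x60 code points apart.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def overlapping_ngrams(text, n=2, suppress_newlines=True):
    """Return overlapping character n-grams of `text`.

    With suppress_newlines=True the lines are joined first, so a term that
    is broken across a line end still yields its n-grams. With False, each
    line is indexed on its own and no n-gram spans a line break.
    """
    lines = text.splitlines()
    segments = ["".join(lines)] if suppress_newlines else lines
    grams = []
    for segment in segments:
        grams.extend(segment[i:i + n] for i in range(len(segment) - n + 1))
    return grams


wrapped = "情報\n検索"  # one four-character term wrapped across two lines
print(overlapping_ngrams(wrapped))                           # joined lines
print(overlapping_ngrams(wrapped, suppress_newlines=False))  # per-line
print(fold_katakana("カタカナ"))
```

In this sketch, suppressing the newline recovers the bigram that straddles the line break, which is the only difference between the two indexing modes.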
Similar resources
CJK Experiments with Hummingbird SearchServer™ at NTCIR-5
Hummingbird submitted ranked result sets for the Chinese, Japanese and Korean Single Language Information Retrieval subtasks of the Cross-Lingual Information Retrieval Task of the 5th NII-NACSIS Test Collection for IR Systems Workshop (NTCIR-5). For short Chinese (title) queries, a decompounded word-based approach produced higher (statistically significant) mean average precision and first relev...
Natural Language Understanding, Semantic-based Information Retrieval and Knowledge Management
Natural language understanding (NLU) has been regarded as one of the core fields of Artificial Intelligence, and the hardest among them. As such, it has been considered an impractical technology from the perspective of real-world applications. However, the situation surrounding NLU has been changing rapidly. We have witnessed large knowledge resources being constructed, manually or autom...
Preface of NTCIR-8
The NTCIR-8 Meeting is where groups that actively participated in one or more of the tasks set by NTCIR-8 report the latest results obtained from the evaluation workshop. The NTCIR evaluation workshop series is designed to enhance research in information access technologies, including text retrieval, cross-language information access, question-answering, information extraction, text mining, etc....
A Japanese-English Technical Lexicon for Translation and Language Research
In this paper we present a Japanese-English bilingual lexicon of technical terms. The lexicon was derived from the first and second NTCIR evaluation collections for research into cross-language information retrieval for Asian languages. While it can be utilized for translation between Japanese and English, the lexicon is also suitable for language research and language engineering. Since it is ...
NTT SMT System 2008 at NTCIR-7 (Taro Watanabe, Hajime Tsukada)
This paper describes NTT SMT System 2008 presented at the patent translation task (PAT-MT) in NTCIR-7. For PAT-MT, we submitted our strong baseline system, faithfully following hierarchical phrase-based statistical machine translation [2]. Hierarchical phrase-based SMT is based on synchronous CFGs in which paired source/target rules are synchronously applied starting from the initial sym...
Publication date: 2002